Optimal Stochastic Search and Adaptive Momentum

Neural Information Processing Systems

Stochastic optimization algorithms typically use learning rate schedules that behave asymptotically as $\mu(t) \sim \mu_0/t$. The ensemble dynamics (Leen and Moody, 1993) for such algorithms provides an easy path to results on mean squared weight error and asymptotic normality. We apply this approach to stochastic gradient algorithms with momentum. We show that at late times, learning is governed by an effective learning rate $\mu_{\mathrm{eff}} = \mu_0/(1 - \beta)$, where $\beta$ is the momentum parameter. We describe the behavior of the asymptotic weight error and give conditions on $\mu_{\mathrm{eff}}$ that ensure optimal convergence speed.
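The effective learning rate can be illustrated with a small numerical sketch. It is not the paper's ensemble-dynamics analysis: it assumes a noiseless one-dimensional quadratic cost with curvature R, a heavy-ball form of the momentum update, and the schedule mu(t) = mu0 / t. Under those assumptions the weight error should decay roughly as t^(-mu_eff * R), so momentum with (mu0, beta) and plain descent with mu0 / (1 - beta) should show about the same decay exponent. The function name `decay_exponent` and all parameter values are hypothetical.

```python
import math


def decay_exponent(mu0, beta, curvature=1.0, t_lo=1000, t_hi=10000):
    """Estimate gamma in |w_t| ~ t**(-gamma) for noiseless momentum descent.

    Assumptions (not from the paper): cost 0.5 * curvature * w**2,
    heavy-ball update, and learning rate mu(t) = mu0 / t.
    """
    w, v = 1.0, 0.0  # initial weight error and velocity
    w_lo = None
    for t in range(1, t_hi):
        if t == t_lo:
            w_lo = w  # weight error at time t_lo
        grad = curvature * w  # exact gradient, no sampling noise
        v = beta * v - (mu0 / t) * grad  # momentum-smoothed step
        w = w + v
    # After the loop, w is the weight error at time t_hi.
    # Slope of log|w| against log t between t_lo and t_hi.
    return -math.log(abs(w) / abs(w_lo)) / math.log(t_hi / t_lo)


if __name__ == "__main__":
    mu0, beta = 0.3, 0.5
    mu_eff = mu0 / (1.0 - beta)
    print("momentum, (mu0, beta):", decay_exponent(mu0, beta))
    print("plain, mu0 = mu_eff:  ", decay_exponent(mu_eff, 0.0))
    print("mu_eff * curvature:   ", mu_eff)
```

With these illustrative values both estimates should lie close to mu_eff * R = 0.6. The sketch covers only the deterministic decay of the weight error; the noise-driven part of the mean squared weight error, which the abstract's optimality conditions also concern, is left out.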